Skip to content

Comments

fix: use iconv_strlen in kanji mode#200

Merged
DASPRiD merged 4 commits intoBacon:mainfrom
vlakoff:shift-jis
Nov 16, 2025
Merged

fix: use iconv_strlen in kanji mode#200
DASPRiD merged 4 commits intoBacon:mainfrom
vlakoff:shift-jis

Conversation

@vlakoff
Copy link
Contributor

@vlakoff vlakoff commented Nov 11, 2025

This fixes the unreadable QR code issue in Kanji mode when using Shift-JIS encoding (see #172).

PR #173 attempted to resolve this by forcing Byte mode, which worked but sacrificed the efficiency of Kanji mode.

In a comment on #173, I discovered the root cause: strlen() was incorrectly used to count characters, leading to misalignment in encoding.

Replacing it with iconv_strlen($content, 'utf-8') ensures accurate character counting for multibyte strings, preserving proper Kanji mode behavior and producing valid QR codes.

@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 11, 2025

Just a couple of thoughts:

  • I had a look at chooseMode. It currently iterates over the string on a byte basis. While this works in practice, a proper (although presumably slower) multibyte-aware iteration might be considered—though it’s not straightforward to implement using iconv alone (i.e., without relying on mbstring).
    Edit: A multibyte-aware loop would bring no benefit in this case, because the loop exits as soon as it encounters a byte that is neither numeric nor alphanumeric—both of which operate on single-byte characters.

  • In the Encoder::encode() at hand, strlen() is sufficient when the mode is Mode::ALPHANUMERIC() or Mode::NUMERIC(). So iconv_strlen() is only strictly necessary for Mode::KANJI(), at the moment.
    Therefore, there might be a small optimization opportunity here: keeping strlen() for the Mode::ALPHANUMERIC() and Mode::NUMERIC() modes could avoid the overhead of iconv_strlen() when it's not needed.
    Edit: I've added a commit for this. strlen() is O(1), which is great.

@vlakoff vlakoff force-pushed the shift-jis branch 3 times, most recently from 1e7eea5 to 8bf099a Compare November 12, 2025 10:35
@vlakoff vlakoff changed the title Fix Kanji mode QR code generation fix: Kanji mode QR code generation Nov 14, 2025
@DASPRiD
Copy link
Member

DASPRiD commented Nov 15, 2025

Please rebase against main to get the tests working.

Fixes unreadable QR codes in Kanji mode with Shift-JIS encoding (see Bacon#172).

PR Bacon#173 worked around the issue by forcing Byte mode, but that lost Kanji mode’s efficiency.

The actual cause was using strlen() to count characters.

Replacing it with iconv_strlen($content, 'utf-8') ensures correct character count and restores proper Kanji encoding.
@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 16, 2025

Rebased. Please let me know what you would like to see improved (including commit messages) before considering this for merging.

@DASPRiD
Copy link
Member

DASPRiD commented Nov 16, 2025

Looks like there's something wrong with your match syntax?

@vlakoff vlakoff force-pushed the shift-jis branch 2 times, most recently from 39a90d6 to 00938aa Compare November 16, 2025 15:51
@codecov
Copy link

codecov bot commented Nov 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.81%. Comparing base (bd2370d) to head (394a4fb).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #200      +/-   ##
============================================
+ Coverage     70.58%   70.81%   +0.22%     
+ Complexity      995      994       -1     
============================================
  Files            49       49              
  Lines          3182     3169      -13     
============================================
- Hits           2246     2244       -2     
+ Misses          936      925      -11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@DASPRiD
Copy link
Member

DASPRiD commented Nov 16, 2025

LGTM, thanks! :)

@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 16, 2025

Looks like there's something wrong with your match syntax?

It's a shame, multiple expressions cannot be on separate lines…

Furthermore, default cannot be grouped with other expressions, i.e. the following is not possible:

Mode::KANJI(), default => iconv_strlen($content, 'utf-8'),

@DASPRiD
Copy link
Member

DASPRiD commented Nov 16, 2025

Actually, I just looked through the other modes – are we certain that they all require iconv_strlen?

@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 16, 2025

I haven’t really dug into the other modes, so I can’t speak with confidence there. My changes were scoped specifically to Kanji mode.

Alternatively, it could look like this:

$numLetters = match ($mode) {
    Mode::BYTE()  => $dataBits->getSizeInBytes(),
    Mode::KANJI() => iconv_strlen($content, 'utf-8'),
    default       => strlen($content),
};

That said, I can’t say for sure whether this covers all cases correctly.

@DASPRiD
Copy link
Member

DASPRiD commented Nov 16, 2025

Well, as chooseMode in this context will only return those known modes, how about this:

Match on all expected modes coming from chooseMode, and in the default case throw an exception about an unexpected mode.

@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 16, 2025

I had thought about this approach too (explicitly matching Mode::KANJI() and omitting the default branch).

I had discarded it because any hypothetical future mode addition could suddenly cause breakage, hence the idea of a default/fallback

But overall, that could still be an acceptable approach.

Note that match already throws an UnhandledMatchError on unmatched cases. But I suppose you would rather want to explicitly throw an exception.

@DASPRiD
Copy link
Member

DASPRiD commented Nov 16, 2025

oh, UnhandledMatchError is totally fine, I didn't know about that. But if somebody modifies chooseMode and adds new modes, they would then be warned by the match if it isn't handled explicitly, which is exactly what we want :)

For NUMERIC and ALPHANUMERIC modes, strlen() is sufficient since these modes only
operate on single-byte characters. strlen() runs in O(1) time and avoids the
overhead of iconv_strlen().
@vlakoff
Copy link
Contributor Author

vlakoff commented Nov 16, 2025

Oh, right — encode() calls chooseMode() right at the beginning, which ensures the possible values for $mode.

And since chooseMode() is a private method only invoked there, its sole purpose is to simplify the logic within encode(), so the two are tightly coupled.

I’ve updated the PR accordingly, and also took the opportunity to rewrite appendBytes() using a match for consistency.

Nicer, and more consistent with some other code in the encode() method.

Invalid modes are really not expected here, as this private method is called only by encode(),
and the mode is determined right at the beginning of encode() by calling chooseMode().

Though, if an invalid mode ever comes here, the match would throw an UnhandledMatchError.
@DASPRiD DASPRiD changed the title fix: Kanji mode QR code generation fix: use iconv_strlen in kanji mode Nov 16, 2025
@DASPRiD DASPRiD merged commit 38d3c55 into Bacon:main Nov 16, 2025
9 checks passed
@vlakoff vlakoff deleted the shift-jis branch November 17, 2025 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants